Exploring regulatory fit between service relationships and appeals in co-production
Acknowledgements: This work was supported by the National Natural Science Foundation of China [grant numbers 72072135; 71872140; 71772141; 71632001] and “the Fundamental Research Funds for the Central Universities” in UIBE [grant number 20QD16]. Peer reviewed. Postprint.
Snipper: A Spatiotemporal Transformer for Simultaneous Multi-Person 3D Pose Estimation Tracking and Forecasting on a Video Snippet
Multi-person pose understanding from RGB videos involves three complex tasks:
pose estimation, tracking and motion forecasting. Intuitively, accurate
multi-person pose estimation facilitates robust tracking, and robust tracking
builds crucial history for correct motion forecasting. Most existing works
either focus on a single task or employ multi-stage approaches that solve
multiple tasks separately, which tends to produce sub-optimal decisions at each
stage and fails to exploit correlations among the three tasks. In this
paper, we propose Snipper, a unified framework to perform multi-person 3D pose
estimation, tracking, and motion forecasting simultaneously in a single stage.
We propose an efficient yet powerful deformable attention mechanism to
aggregate spatiotemporal information from the video snippet. Building upon this
deformable attention, a video transformer is learned to encode the
spatiotemporal features from the multi-frame snippet and to decode informative
pose features for multi-person pose queries. Finally, these pose queries are
regressed to predict multi-person pose trajectories and future motions in a
single shot. In the experiments, we show the effectiveness of Snipper on three
challenging public datasets where our generic model rivals specialized
state-of-the-art baselines for pose estimation, tracking, and forecasting
- …
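The Snipper abstract above describes deformable attention that aggregates spatiotemporal information from a video snippet. The following is only a minimal sketch of the general idea, not the authors' implementation: a query attends to a few sampled locations around a reference point in a (time, height, width) feature volume instead of to every position. The function names, the nested-list feature layout, and the choice to pass offsets and attention logits in directly (a real model would typically predict them from the query with learned layers) are all assumptions made for illustration.

```python
import math


def trilinear_sample(volume, t, y, x):
    """Sample a C-dim feature from volume[t][y][x] at a fractional location.

    Hypothetical helper: coordinates are clamped to the volume bounds, then
    the eight surrounding grid features are blended by trilinear weights.
    """
    T, H, W = len(volume), len(volume[0]), len(volume[0][0])
    C = len(volume[0][0][0])
    t = min(max(t, 0.0), T - 1)
    y = min(max(y, 0.0), H - 1)
    x = min(max(x, 0.0), W - 1)
    t0, y0, x0 = int(math.floor(t)), int(math.floor(y)), int(math.floor(x))
    t1, y1, x1 = min(t0 + 1, T - 1), min(y0 + 1, H - 1), min(x0 + 1, W - 1)
    ft, fy, fx = t - t0, y - y0, x - x0
    out = [0.0] * C
    for ti, wt in ((t0, 1.0 - ft), (t1, ft)):
        for yi, wy in ((y0, 1.0 - fy), (y1, fy)):
            for xi, wx in ((x0, 1.0 - fx), (x1, fx)):
                w = wt * wy * wx
                if w == 0.0:
                    continue
                for c in range(C):
                    out[c] += w * volume[ti][yi][xi][c]
    return out


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]


def deformable_attention(volume, reference, offsets, logits):
    """Aggregate features at a few offset locations around a reference point.

    volume:    nested lists indexed [t][y][x] -> list of C floats
    reference: (t, y, x) reference point for one query (floats)
    offsets:   list of K (dt, dy, dx) sampling offsets (assumed given here)
    logits:    list of K unnormalized attention scores (assumed given here)
    """
    weights = softmax(logits)
    C = len(volume[0][0][0])
    out = [0.0] * C
    for (dt, dy, dx), w in zip(offsets, weights):
        sample = trilinear_sample(
            volume, reference[0] + dt, reference[1] + dy, reference[2] + dx
        )
        for c in range(C):
            out[c] += w * sample[c]
    return out
```

The cost per query in this sketch grows with the number of sampled points K rather than with the full size of the snippet, which is the usual motivation for deformable over dense attention.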